Genome Sizes and the Benford Distribution

Authors

  • James L. Friar
  • Terrance Goldman
  • Juan Pérez–Mercader
Abstract

BACKGROUND Data on the number of Open Reading Frames (ORFs) coded by genomes from the 3 domains of Life show the presence of some notable general features. These include essential differences between the Prokaryotes and Eukaryotes, with the number of ORFs growing linearly with total genome size for the former, but only logarithmically for the latter.

RESULTS Simply by assuming that the (protein) coding and non-coding fractions of the genome must have different dynamics and that the non-coding fraction must be particularly versatile and therefore be controlled by a variety of (unspecified) probability distribution functions (pdf's), we are able to predict that the number of ORFs for Eukaryotes follows a Benford distribution and must therefore have a specific logarithmic form. Using the data for the 1000+ genomes available to us in early 2010, we find that the Benford distribution provides excellent fits to the data over several orders of magnitude.

CONCLUSIONS In its linear regime the Benford distribution produces excellent fits to the Prokaryote data, while the full non-linear form of the distribution similarly provides an excellent fit to the Eukaryote data. Furthermore, in their region of overlap the salient features are statistically congruent. This allows us to interpret the difference between Prokaryotes and Eukaryotes as the manifestation of the increased demand in the biological functions required for the larger Eukaryotes, to estimate some minimal genome sizes, and to predict a maximal Prokaryote genome size on the order of 8-12 megabasepairs. These results naturally allow a mathematical interpretation in terms of maximal entropy and, therefore, most efficient information transmission.
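The abstract contrasts a linear regime with a full non-linear, logarithmic form but does not state the functional form itself. As a purely illustrative sketch, the snippet below uses a hypothetical curve a * ln(1 + x / b), chosen only because it is approximately linear for small x and logarithmic for large x. The function names and the parameters a and b are assumptions made for illustration and are not taken from the paper; no genome data are used.

```python
import math


def log_form(x, a, b):
    """Hypothetical curve a * ln(1 + x / b); a and b are illustrative only."""
    return a * math.log1p(x / b)


def linear_approx(x, a, b):
    """Small-x limit of log_form: ln(1 + u) is close to u when u is small."""
    return a * x / b


def log_approx(x, a, b):
    """Large-x limit of log_form: ln(1 + u) is close to ln(u) when u is large."""
    return a * math.log(x / b)


# Arbitrary illustrative parameters (assumptions, not fitted values).
a, b = 1.0, 1.0
small, large = 1e-3, 1e6

# Relative error of each approximation in the regime where it should hold.
rel_err_small = abs(log_form(small, a, b) - linear_approx(small, a, b)) / log_form(small, a, b)
rel_err_large = abs(log_form(large, a, b) - log_approx(large, a, b)) / log_form(large, a, b)

print(f"linear approximation, relative error at x = {small}: {rel_err_small:.2e}")
print(f"log approximation, relative error at x = {large}: {rel_err_large:.2e}")
```

Under these assumptions, a single curve can look linear over one range of sizes and logarithmic over another, which is the qualitative behaviour the abstract describes for Prokaryotes and Eukaryotes respectively.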


Similar resources

Benford Distribution and its Applications

This paper considers the Benford distribution as one of the interesting rules of probability theory. This rule shows that, despite the usual understanding that digits are uniformly distributed across the different positions in numbers, digits follow a non-uniform distribution for numbers arising from many natural phenomena. Benford distribution proposes ...


Detecting Fraud Using Modified Benford Analysis

Large enterprises frequently enforce accounting limits to reduce the impact of fraud. As a complement to accounting limits, auditors use Benford analysis to detect traces of undesirable or illegal activities in accounting data. Unfortunately, the two fraud fighting measures often do not work well together. Accounting limits may significantly disturb the digit distribution examined by Benford an...


Order Statistics and Shifted Almost Benford Behavior

Fix a base B and let ζ have the standard exponential distribution; the distribution of digits of ζ base B is known to be very close to Benford’s Law. If there exists a C such that the distribution of digits of C times the elements of the system is the same as that of ζ, we say the system exhibits Shifted Almost Benford behavior base B (with a shift of log_B C mod 1). Let X1, . . . , XN be indepen...


Generalizing Benford’s Law Using Power Laws: Application to Integer Sequences

A simple method to derive parametric analytical extensions of Benford’s law for first digits of numerical data is proposed. Two generalized Benford distributions are considered, namely the two-sided power Benford (TSPB) distribution, which was introduced in Hürlimann (2003), and the new Pareto Benford (PB) distribution. Based on the minimum chi-square estimators, the fitting capabilities of ...


Differences between Independent Variables and Almost Benford Behavior

Fix a base B and let X1, . . . , XN be independent identically distributed random variables. If the Xi’s are drawn from a uniform distribution, then as N → ∞ the distribution of the digits of the differences between adjacent Xi’s tends to a universal distribution which is almost Benford’s Law; we call this Almost Benford behavior. For each base we develop a rapidly convergent Fourier series exp...



Journal title:

Volume 7, Issue

Pages -

Publication date: 2012